Abstract. We present rigorous mathematical analyses of a number of well-known mathematical
models for genetic mutations. In these models, the genome is represented by a vertex of
the $n$-dimensional binary hypercube, for some $n$, a mutation involves the flipping of a single
bit, and each vertex is assigned a real number, called its fitness, according to some rules. Our
main concern is with the issue of existence of accessible paths, that is, monotonic paths across
the hypercube along which fitness is always increasing. Our main results resolve open questions
about three such models, which in the biophysics literature are known as House of Cards
(HoC), Constrained House of Cards (CHoC) and Rough Mount Fuji (RMF). We prove that the
probability of there being at least one (selectively) accessible path tends respectively to 0, 1
and 1, as $n$ tends to infinity. A crucial idea is the introduction of a generalisation of the CHoC
model, in which the fitness of the all-zeroes node is set to some $\alpha = \alpha_n \in [0,1]$. We prove that
there is a very sharp threshold at $\alpha_n = \frac{\ln n}{n}$ for the existence of accessible paths. As a corollary
we prove significant concentration, for $\alpha$ below the threshold, of the number of accessible paths
about the expected value (the precise statement is technical, see Corollary 1.4). In the case of
RMF, we prove that the probability of accessible paths existing tends to 1 provided the drift
parameter $\theta = \theta_n$ satisfies $n\theta_n \to \infty$, and for any fitness distribution which is continuous on its
support and whose support is connected.

0. Notation

Let $g, h : \mathbb{N} \to \mathbb{R}_+$ be any two functions. We will employ the following notations throughout,
all of which are quite standard:

(i) $g(n) \sim h(n)$ means that $\lim_{n\to\infty} \frac{g(n)}{h(n)} = 1$.

(ii) $g(n) \lesssim h(n)$ means that $\limsup_{n\to\infty} \frac{g(n)}{h(n)} \le 1$.

(iii) $g(n) \gtrsim h(n)$ means that $h(n) \lesssim g(n)$.

(iv) $g(n) = O(h(n))$ means that $\limsup_{n\to\infty} \frac{g(n)}{h(n)} < \infty$.

(v) $g(n) = \Omega(h(n))$ means that $h(n) = O(g(n))$.

(vi) $g(n) = \Theta(h(n))$ means that both $g(n) = O(h(n))$ and $h(n) = O(g(n))$ hold.

(vii) $g(n) = o(h(n))$ means that $\lim_{n\to\infty} \frac{g(n)}{h(n)} = 0$.

If $g, h$ are instead random variables, we use the above notations when the corresponding
relationships hold with probability tending to 1 as $n \to \infty$. More precisely, $f(n) \sim g(n)$ for example
means that, for all $\varepsilon_1, \varepsilon_2 > 0$ and $n \gg 0$,

(0.1)    $\mathbb{P}\left(1 - \varepsilon_1 < \frac{f(n)}{g(n)} < 1 + \varepsilon_1\right) > 1 - \varepsilon_2$.

1. Introduction 

In many basic mathematical models of genetic mutations, the genome is represented as a node
of the $n$-dimensional binary hypercube $Q_n$ and each mutation involves the flipping of a single
bit, hence displacement along an edge of $Q_n$. Each node $v \in Q_n$ is assigned a real number $f(v)$,
called its fitness. The fitness of a node is not a constant, but is drawn from some probability
distribution specified by the model. This distribution may vary from node to node in more or
less complicated ways, depending on the model. Basically, however, evolution is considered as
favoring mutational pathways which, on average, lead to higher fitness. A fundamental concept
in this regard is the following (see [W2], [W1], [FKVK]):



Definition 1.1. Let $f : Q_n \to \mathbb{R}$ be a fitness function. A (selectively) accessible path is
a path in $Q_n$

(1.1)    $v_0 \to v_1 \to \cdots \to v_{n-1} \to v_n$,

such that

(i) $v_0$ and $v_n$ are a pair of antipodal nodes,

(ii) $f(v_i) > f(v_{i-1})$ for $i = 1, \dots, n$.

A basic question in such models is whether accessible paths exist or not with high probability.
We shall be concerned below with the following three well-known models, in which no
rigorous answer to this question has previously been given. Let $v^0 = (0,0,\dots,0)$, $v^1 = (1,1,\dots,1)$
denote the all-zeroes and all-ones vertices in $Q_n$ respectively.

Model 1: Unconstrained House of Cards (HoC) 

This model is originally attributed to Kingman [Ki]. In the form we consider below, it was
first studied by Kauffman and Levin [KL]. We set $f(v^1) := 1$ and, for every other node
$v \in Q_n$, let $f(v) \sim U(0,1)$, the uniform distribution on the interval $[0,1]$.

Model 2: Constrained House of Cards (CHoC) 

This variant seems to have been considered only more recently, see for example [Klo] and [CH].
The only difference from Model 1 is that we fix $f(v^0) := 0$.

Model 3: Rough Mount Fuji (RMF) 

This model was first proposed in [X], see also [FWK]. It includes two parameters, a fixed
probability distribution $\eta$, and a positive number $\theta$, called the drift, which may depend on the
dimension $n$. For each $v \in Q_n$ one lets

(1.2)    $f(v) = \theta \cdot d(v, v^0) + \eta(v)$.

In other words, one first assigns a fitness to each node at random, according to $\eta$, and independent
of all other nodes. Then the fitness of each node is shifted upwards by a fixed multiple of
the Hamming distance from $v^0$.

In all three models, the basic random variable of interest is the number $X = X(n)$ of accessible
paths from $v^0$ to $v^1$. One thinks of $v^0$ as the starting point of some evolutionary
process, and $v^1$ as the desirable endpoint. The HoC model is often referred to as a "null model"
for evolution, since the fitnesses of all nodes other than $v^1$ are assigned at random and independently
of one another. No mechanism is prescribed which might push an evolutionary process
in any particular direction. The CHoC model is not much better, though it does specify that
the starting point is a global fitness minimum. The RMF model is a very natural, and simple,
way to introduce an "arrow of evolution", since the drift factor implies that successive $0 \to 1$
mutations will tend to increase fitness.
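The three models are easy to simulate for small $n$. The following is a minimal sketch, not taken from any of the cited simulation studies: vertices are encoded as integer bitmasks, accessible paths are counted by dynamic programming over the directed hypercube, and the function names (`count_accessible_paths`, `hoc_fitness`, `rmf_fitness`), the default normal noise and the trial parameters are our own illustrative assumptions.

```python
import random


def count_accessible_paths(n, fitness):
    """Count fitness-increasing directed paths from 00...0 to 11...1.

    `fitness` is a sequence of length 2**n indexed by vertex bitmask.
    """
    full = (1 << n) - 1
    paths = [0] * (1 << n)  # paths[v] = accessible paths from vertex 0 to v
    paths[0] = 1
    # Clearing a bit always gives a smaller integer, so numeric order
    # visits every predecessor of v before v itself.
    for v in range(1, full + 1):
        for b in range(n):
            if (v >> b) & 1:
                u = v ^ (1 << b)
                if fitness[u] < fitness[v]:
                    paths[v] += paths[u]
    return paths[full]


def hoc_fitness(n, rng, alpha=None):
    """HoC sample; pass alpha to fix f(v^0) = alpha (alpha = 0 gives CHoC)."""
    f = [rng.random() for _ in range(1 << n)]
    f[-1] = 1.0  # the all-ones vertex has fitness 1
    if alpha is not None:
        f[0] = alpha
    return f


def rmf_fitness(n, theta, rng, noise=None):
    """RMF sample f(v) = theta * d(v, v^0) + eta(v); eta defaults to N(0, 1)."""
    if noise is None:
        noise = lambda r: r.gauss(0.0, 1.0)
    return [theta * bin(v).count("1") + noise(rng) for v in range(1 << n)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, trials = 8, 200
    samplers = [
        ("HoC", lambda: hoc_fitness(n, rng)),
        ("CHoC", lambda: hoc_fitness(n, rng, alpha=0.0)),
        ("RMF", lambda: rmf_fitness(n, 0.5, rng)),
    ]
    for name, draw in samplers:
        hits = sum(count_accessible_paths(n, draw()) > 0 for _ in range(trials))
        print(name, hits / trials)  # empirical estimate of P(X > 0)
```

The dynamic programme costs $O(n 2^n)$ per sample, so it is only practical for the small dimensions used in simulations, not for probing the asymptotic regime.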

It seems intuitively obvious that the number $X$ of accessible paths should, on average, be
much higher in RMF than in HoC. One should be a little careful here, since in RMF, the node
$v^1$ is not assumed to be a global fitness maximum. Nevertheless, simulations reported in the
biophysics literature support this intuition. In particular, let $P = P(n)$ be the probability of
there being at least one accessible path, i.e.: $P = \mathbb{P}(X > 0)$. In [FKVK] it was conjectured
explicitly that $P \to 0$ in the HoC model, and that $P \to 1$ in the RMF model, when $\eta$ is
a normal distribution and $\theta$ is any positive constant. It also seems intuitively clear that the
CHoC model lies somewhere in between. In [CH] this model was simulated for $n \le 13$, and the
authors conjecture, if somewhat implicitly, that $P$ is monotonic decreasing in $n$ and approaches
a limiting value close to 0.7. In [FKVK], simulations were continued up to $n = 19$ and these
indicated clearly that $P$ was not, after all, monotonic decreasing. The authors abstain from
making any explicit conjecture about the limiting behaviour of $P$.

Our main results below resolve all these issues. A crucial idea is to consider the following 
slight generalisation of the CHoC model: 



Model 4: $\alpha$-Constrained House of Cards ($\alpha$-HoC)

Let $\alpha \in [0,1]$. In this model, fitnesses are assigned as in the CHoC model, with the exception
that we set $f(v^0) := \alpha$. Hence, CHoC is the case $\alpha = 0$.

Let $P(n, \alpha)$ denote the probability of there being an accessible path in the $\alpha$-constrained HoC
model on the $n$-hypercube. Note that $P(n, \alpha)$ decreases as $\alpha$ increases. Our first main result is
the following:

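Theorem 1.2. Let $\varepsilon = \varepsilon_n > 0$. If $n\varepsilon_n \to \infty$ then

(1.3)    $\lim_{n\to\infty} P\left(n, \frac{\ln n}{n} - \varepsilon_n\right) = 1$

and

(1.4)    $\lim_{n\to\infty} P\left(n, \frac{\ln n}{n} + \varepsilon_n\right) = 0$.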

It follows immediately that $P \to 1$ in the CHoC model. The above result says a lot more,
however. It shows that there is a very sharp threshold at $\alpha = \alpha_n = \frac{\ln n}{n}$ for the existence of
accessible paths in the $\alpha$-HoC model. Theorem 1.2 will be proven in Section 2. We have the
following immediate corollary for HoC:

Corollary 1.3. Let $X$ denote the number of accessible paths in the unconstrained House of
Cards (HoC) model. Then

(1.5)    $\mathbb{P}(X > 0) \sim \frac{\ln n}{n}$.

Proof. As $P(n, \alpha)$ is decreasing in $\alpha$ we know that for any $\alpha \in [0,1]$, $\mathbb{P}(X > 0) \ge \alpha P(n, \alpha)$.
Picking $\alpha = \frac{\ln n}{n} - \varepsilon_n$ where $n\varepsilon_n$ tends to infinity sufficiently slowly yields $\mathbb{P}(X > 0) \gtrsim \frac{\ln n}{n}$.
To get the upper bound, let $\alpha = \frac{\ln n}{n}$. Now, if the hypercube has accessible paths, then either
$v^0$ has fitness at most $\alpha$ or there is an accessible path where all nodes involved have fitness at
least $\alpha$. Obviously the former event occurs with probability $\alpha$. Concerning the latter, if

(1.6)    $v^0 \to v_1 \to \cdots \to v_{n-1} \to v^1$

is any path, then the probability of all nodes along it having fitness at least $\alpha$ is $(1-\alpha)^n$. The
probability of fitness being increasing along the path is $1/n!$. Since there are $n!$ possible paths
of the form (1.6), it follows from a simple union bound that

(1.7)    $\mathbb{P}(X > 0) \le \alpha + n! \, \frac{(1-\alpha)^n}{n!} \le \frac{\ln n}{n} + \frac{1}{n}$.
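The last inequality in the union bound rests on $(1-\alpha)^n \le e^{-\alpha n} = 1/n$ for $\alpha = \ln n / n$. As a quick numerical sanity check (a sketch only; the helper name `hoc_upper_bound` is ours), one can evaluate both sides for a few values of $n$:

```python
from math import log


def hoc_upper_bound(n):
    """Return (alpha + (1 - alpha)^n, ln(n)/n + 1/n) for alpha = ln(n)/n."""
    alpha = log(n) / n
    union_bound = alpha + (1.0 - alpha) ** n  # n! paths, each with prob (1-alpha)^n / n!
    simplified = alpha + 1.0 / n
    return union_bound, simplified


if __name__ == "__main__":
    for n in (2, 10, 1000, 10**6):
        print(n, hoc_upper_bound(n))
```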



Another corollary of Theorem 1.2 concerns the distribution of the number of accessible paths
in the $\alpha$-HoC model for $\alpha = \frac{\ln n}{n} - \varepsilon_n$ where $n\varepsilon_n \to \infty$. It is straightforward to show that the expected
number of paths in the $\alpha$-HoC model is $n(1-\alpha)^{n-1}$ (see Proposition 2.1), which for this choice
of $\alpha$ is $\sim e^{n\varepsilon_n}$. We have the following result:



Corollary 1.4. Let $X$ denote the number of accessible paths in the $\alpha$-constrained House of
Cards ($\alpha$-HoC) model for $\alpha = \frac{\ln n}{n} - \varepsilon_n$ where $n\varepsilon_n \to \infty$. If $w_n \to \infty$ then

(1.8)    $\lim_{n\to\infty} \mathbb{P}\left(\frac{1}{w_n} \mathbb{E}[X] \le X \le w_n \mathbb{E}[X]\right) = 1$.

Corollary 1.4 will be proven in Subsection 2.5.

Our second main result concerns the RMF model. For any function $f : \mathbb{R} \to \mathbb{R}$, recall
that the support of $f$, denoted $\mathrm{Supp}(f)$, is the set of points at which $f$ is non-zero, i.e.:
$\mathrm{Supp}(f) = \{x : f(x) \ne 0\}$. We say that $f$ has connected support if $\mathrm{Supp}(f)$ is a connected
subset of $\mathbb{R}$. Our result is the following:



Theorem 1.5. Let $\eta$ be any probability distribution whose p.d.f. is continuous on its support
and whose support is connected. Let $\theta_n$ be any strictly positive function of $n$ such that $n\theta_n \to \infty$
as $n \to \infty$. Then in the model (1.2), $P(n)$ tends to one as $n \to \infty$.



This result is proven in Section 3. The proof follows similar lines to that of Theorem 1.2 but
the analysis is somewhat simpler.

2. Results for the HoC models 

In this section, a path will always refer to a path through the directed hypercube, meaning
that $d(v_i, v^0)$ is strictly increasing along paths. For each path $i$ from $v^0$ to $v^1$ let $X_i$ be the
indicator function of the event that $i$ is accessible, and let $X = \sum_i X_i$ denote the number of
accessible paths from $v^0$ to $v^1$. Furthermore, given a path $i$ from $v^0$ to $v^1$ in the $n$-dimensional
hypercube, let $T(n, k)$ denote the number of paths from $v^0$ to $v^1$ that intersect $i$ in exactly
$k - 1$ interior nodes.

Proposition 2.1. Let $X$ denote the number of accessible paths in the $\alpha$-HoC model. Then

(2.1)    $\mathbb{E}[X] = n(1-\alpha)^{n-1}$.

Proof. There are $n!$ paths through the hypercube. A path is accessible if all $n - 1$ interior nodes
have fitness at least $\alpha$ and the fitness of the interior nodes is increasing along the path. This
occurs with probability $(1-\alpha)^{n-1}/(n-1)!$. ■

Note that for $\alpha = \frac{\ln n}{n} + \varepsilon_n$, this implies that the probability of accessible paths tends to 0
for any sequence $\varepsilon_n$ satisfying $n\varepsilon_n \to \infty$. Furthermore, for $\alpha = \frac{\ln n}{n} - \varepsilon_n$ where $n\varepsilon_n \to \infty$, the
expected number of paths tends to infinity. To show that the probability of there being at least
one accessible path tends to 1 in this case, we will begin by showing that the probability is at
least $\frac{1}{4} - o(1)$ by estimating the second moment of $X$ and applying Lemma 2.2. We will then
use a symmetry argument to show that the probability must tend to 1.
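To get a feeling for how sharply the first moment $n(1-\alpha)^{n-1}$ reacts to $\alpha$ near $\ln n / n$, the following sketch evaluates it just below and just above that value. The choices $n = 10^6$ and $\varepsilon_n = 5/n$, and the function name, are illustrative assumptions only.

```python
from math import log


def expected_accessible_paths(n, alpha):
    """First moment n * (1 - alpha)^(n - 1) of the number of accessible paths."""
    return n * (1.0 - alpha) ** (n - 1)


if __name__ == "__main__":
    n = 10**6
    eps = 5.0 / n  # illustrative choice with n * eps = 5
    print(expected_accessible_paths(n, log(n) / n - eps))  # roughly e^5
    print(expected_accessible_paths(n, log(n) / n + eps))  # roughly e^-5
```

Only the first moment is shown here; that the probability of an accessible path actually tends to 1 below the threshold requires the second moment analysis that follows.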

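Lemma 2.2. Let $X$ be a random variable with finite expected value and finite and non-zero
second moment. Then

(2.2)    $\mathbb{P}(X \ne 0) \ge \frac{\mathbb{E}[X]^2}{\mathbb{E}[X^2]}$.

Proof. Let $1_{X \ne 0}$ denote the indicator function of $X \ne 0$. Then, by the Cauchy-Schwarz
inequality, $\mathbb{E}[X]^2 = \mathbb{E}[1_{X \ne 0} X]^2 \le \mathbb{E}[1_{X \ne 0}^2] \cdot \mathbb{E}[X^2] = \mathbb{P}(X \ne 0) \cdot \mathbb{E}[X^2]$. ■

See also Exercise 4.8.1 in [AS].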
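The inequality (2.2) can be checked exactly on any finite distribution. The sketch below uses rational arithmetic; the function name and the two example distributions are our own and serve only as illustration. The second example is supported on $\{0, c\}$, where the bound holds with equality.

```python
from fractions import Fraction


def second_moment_bound(pmf):
    """Return (P(X != 0), E[X]^2 / E[X^2]) for a finite pmf {value: probability}."""
    assert sum(pmf.values()) == 1
    mean = sum(x * p for x, p in pmf.items())
    second = sum(x * x * p for x, p in pmf.items())
    p_nonzero = sum(p for x, p in pmf.items() if x != 0)
    return p_nonzero, mean * mean / second


if __name__ == "__main__":
    generic = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}
    two_point = {0: Fraction(2, 3), 3: Fraction(1, 3)}
    print(second_moment_bound(generic))    # strict inequality
    print(second_moment_bound(two_point))  # equality
```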



Sometimes in the mathematical literature, the support of a function is defined to be the closure of this set.
Proposition 2.3. Let $i$ and $j$ be paths with exactly $k - 1$ common interior nodes. Then

(2.3)    $\mathbb{E}[X_i X_j] \le \binom{2n-2k}{n-k} \frac{(1-\alpha)^{2n-k-1}}{(2n-k-1)!}$,

where equality holds if the nodes where $i$ and $j$ differ are consecutive along the paths, i.e. if $i$
and $j$ diverge at most once. Furthermore,

(2.4)    $\mathbb{E}[X^2] \le \sum_{k=1}^{n} n! \, T(n,k) \binom{2n-2k}{n-k} \frac{(1-\alpha)^{2n-k-1}}{(2n-k-1)!}$.

Proof. The event that $i$ and $j$ are both accessible occurs if all $2n - k - 1$ interior nodes have
fitness at least $\alpha$ and the fitnesses of the interior nodes are ordered in such a way that fitness
increases along both paths.

Conditioned on the event that all interior nodes have fitness at least $\alpha$, all possible ways
the fitnesses of the interior nodes can be ordered are equally likely. This implies that the
probability that both paths are accessible is $(1-\alpha)^{2n-k-1}/(2n-k-1)!$ times the number of
ways to order the fitnesses of the interior nodes such that fitness increases along both paths.

To count the number of ways this can be done we color the numbers $1, \dots, 2n-k-1$ in the
following way: The number $l$ is colored gray if the interior node with the $l$:th smallest fitness
is contained in both paths, red if it is only contained in $i$ and blue if only in $j$. Note that $i$
and $j$ uniquely determine which numbers must be gray for a valid ordering, and that given any
coloring corresponding to a valid order one can recover the order.

Clearly, any coloring corresponding to a valid order colors half of the non-gray numbers red
and half blue, which implies that there can be at most $\binom{2n-2k}{n-k}$ such orders. Furthermore, if $i$
and $j$ diverge at most once, one can always construct a valid order from such a coloring, so in
this case there are exactly $\binom{2n-2k}{n-k}$ such orders.

As the number of ordered pairs of paths that intersect in exactly $k - 1$ interior nodes is
$n! \, T(n,k)$, (2.4) follows from this estimate. ■
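The counting step in this proof can be verified by brute force for small cases. In the sketch below (our own illustration, with hypothetical helper names), a path is given by the order in which it flips the bits, its interior nodes are bitmasks, and we enumerate all rankings of the union of interior nodes, keeping those that increase along both paths. For $n = 4$, $k = 2$ the bound is $\binom{4}{2} = 6$, attained by a pair that diverges once and not by a pair that diverges twice.

```python
from itertools import permutations
from math import comb


def interior_nodes(flip_order):
    """Interior vertices (bitmasks) of the path that flips bits in the given order."""
    mask, nodes = 0, []
    for bit in flip_order[:-1]:
        mask |= 1 << bit
        nodes.append(mask)
    return nodes


def count_valid_orders(order_i, order_j):
    """Number of fitness rankings of the interior nodes increasing along both paths."""
    path_i, path_j = interior_nodes(order_i), interior_nodes(order_j)
    nodes = sorted(set(path_i) | set(path_j))
    count = 0
    for ranks in permutations(range(len(nodes))):
        rank = dict(zip(nodes, ranks))
        ok_i = all(rank[a] < rank[b] for a, b in zip(path_i, path_i[1:]))
        ok_j = all(rank[a] < rank[b] for a, b in zip(path_j, path_j[1:]))
        count += ok_i and ok_j
    return count


if __name__ == "__main__":
    base = (0, 1, 2, 3)
    print(count_valid_orders(base, (1, 2, 0, 3)), comb(4, 2))  # diverge once
    print(count_valid_orders(base, (1, 0, 3, 2)), comb(4, 2))  # diverge twice
```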

Remark 2.4. The numbers $T(n, k)$ already appear in the mathematical literature. The usual
terminology is that $T(n, k)$ is the number of permutations of $\{1, 2, \dots, n\}$ with $k$ components.
An alternative terminology is that it is the number of permutations of $\{1, 2, \dots, n\}$ with $k - 1$
global descents. A global descent in a permutation $\pi_1 \pi_2 \cdots \pi_n$ of $\{1, 2, \dots, n\}$ is a number
$t \in [1, n-1]$ such that $\pi_i > \pi_j$ for all $i \le t$ and $j > t$. There is a simple 1-1 correspondence
between permutations with $k$ components and those with $k - 1$ global descents, obtained by reading
a permutation backwards. In other words, $\pi_1 \pi_2 \cdots \pi_n$ has $k - 1$ global descents if and only if
$\pi_n \pi_{n-1} \cdots \pi_1$ has $k$ components.

There is a database of the numbers $T(n, k)$ for small $n$ and $k$, see [O2]. The book of Comtet
[Co2] referred to at this link contains a couple of exercises and an implicit recursion formula for
$T(n, k)$. Comtet has also performed a detailed asymptotic analysis of the numbers $T(n, 1)$ in
[Co1]. Permutations with one component (i.e.: no global descents) are variously referred to as
connected, indecomposable, irreducible. These seem to crop up quite a lot, see [O1]. However,
estimates of the numbers $T(n, k)$ for general $n$ and $k$ like those in Propositions 2.10 and 2.12
below do not appear to have been obtained before.

2.1. Useful formulas for $T(n, k)$.

Proposition 2.5. $T(n, 1)$ is uniquely defined by

(2.5)    $n! = \sum_{k=1}^{n} T(k, 1) (n-k)!$.

Proof. Given a path $i$ through the $n$-hypercube, the number of paths $j$ that intersect $i$ for the
first time in step $k$ is $T(k, 1)(n-k)!$. As any path through the hypercube intersects $i$ for the
first time after between 1 and $n$ steps, the Proposition follows. ■
Proposition 2.6.

(2.6)    $n! \left(1 - O\left(\frac{1}{n}\right)\right) \le T(n, 1) \le n!$.

Proof. By definition, $T(n, 1) \le n!$. Using this, Proposition 2.5 implies that $T(n, 1)$ is at least
$n! - \sum_{k=1}^{n-1} k! (n-k)! = n! - O((n-1)!)$. ■

Proposition 2.7.

(2.7)    $T(n, k) = \sum_{\substack{s_1, \dots, s_k \ge 1 \\ s_1 + \cdots + s_k = n}} T(s_1, 1) \cdots T(s_k, 1)$.

Proof. Given a path $i$, the number of paths that intersect $i$ for the first time after $s_1$ steps, for
the second time after $s_2$ more steps and so on up to the last time (at $11 \dots 1$) after $n$ steps is
$T(s_1, 1) \cdots T(s_{k-1}, 1) \cdot T(n - s_1 - \cdots - s_{k-1}, 1)$. Let $s_k = n - s_1 - \cdots - s_{k-1}$. $T(n, k)$ is obtained
by summing over all possible values of $s_1, \dots, s_k$. ■

Proposition 2.8. For $k \ge 2$, $T(n, k)$ satisfies

(2.8)    $T(n, k) = \sum_{s=1}^{n-k+1} T(s, 1) \, T(n - s, k - 1)$.

Proof. It follows by induction that this sum equals the right hand side in (2.7). ■
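The formulas (2.5) and (2.8) give a direct way to tabulate $T(n, k)$. The following sketch (function names are ours) computes $T(n, 1)$ by solving (2.5) for its last term, extends to general $k$ with (2.8), and cross-checks the result for small $n$ against a brute-force count of permutations with $k$ components, where a cut occurs after position $t$ whenever the first $t$ entries are exactly $\{1, \dots, t\}$.

```python
from functools import lru_cache
from itertools import permutations
from math import factorial


@lru_cache(maxsize=None)
def T1(n):
    """T(n, 1), solved from (2.5): n! = sum_{k=1}^{n} T(k, 1) (n - k)!."""
    return factorial(n) - sum(T1(k) * factorial(n - k) for k in range(1, n))


@lru_cache(maxsize=None)
def T(n, k):
    """T(n, k) via the recursion (2.8); returns 0 when k > n."""
    if k == 1:
        return T1(n)
    return sum(T1(s) * T(n - s, k - 1) for s in range(1, n - k + 2))


def components(perm):
    """Number of components of a permutation of 1..n given as a tuple."""
    cuts, running_max = 0, 0
    for t, value in enumerate(perm[:-1], start=1):
        running_max = max(running_max, value)
        if running_max == t:  # the first t entries are exactly {1, ..., t}
            cuts += 1
    return cuts + 1


def T_brute(n, k):
    """Brute-force count of permutations of 1..n with exactly k components."""
    return sum(components(p) == k for p in permutations(range(1, n + 1)))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, [T(n, k) for k in range(1, n + 1)])
```

In path language, the permutation records the order in which the second path flips the bits relative to the first, and each cut corresponds to a common interior node.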

2.2. Upper bounds for $T(n, k)$.

Proposition 2.9. For any $n \ge k \ge 1$

(2.9)    $T(n, k) \le k \sum \Big(n - \sum_j s_j\Big)! \cdot \prod_j s_j!$,

where the first sum goes over all integers $s_1, \dots, s_{k-1}$ such that $s_j \ge 1$ for all $j$ and $\max_j s_j \le n - \sum_j s_j$.

Proof. Consider the formula for $T(n, k)$ in Proposition 2.7. By symmetry, $T(n, k)$ is at most $k$
times the contribution from terms where $s_j \le s_k$ for $j = 1, \dots, k-1$. The Proposition follows by
applying $T(s, 1) \le s!$. ■

Proposition 2.10. There is a positive constant $c$ such that for all $n \ge k \ge 1$

(2.10)    $T(n, k) \le k (n-k+1)! \, e^{c(k-1)/(n-k+1)}$.

Proof. We use Proposition 2.9 and make the following approximations:

• substitute $(n - \sum_j s_j)!$ by $(n-k+1)! \, \beta^{k-1-\sum_j s_j}$ where $\beta = ((n-k+1)!)^{1/(n-k)}$. It follows from
log-convexity of $l!$ that $(n-k+1)! \, \beta^{l-(n-k+1)} \ge l!$ for any $1 \le l \le n-k+1$.

• let all $s_j$ go from 1 to $\lfloor (n-k+1)/2 + 1 \rfloor$.

This yields

(2.11)    $T(n, k) \le k (n-k+1)! \left( \sum_{s=1}^{\lfloor (n-k+1)/2+1 \rfloor} s! \, \beta^{1-s} \right)^{k-1}$.

We now claim that this sum is always less than $1 + c/(n-k+1)$ for sufficiently large $c$.
Indeed

$\sum_{s=1}^{\lfloor (n-k+1)/2+1 \rfloor} s! \, \beta^{1-s} = 1 + 2\beta^{-1} + \sum_{t=1}^{\lfloor (n-k+1)/2-1 \rfloor} t! (t+1)(t+2) \beta^{-t-1}$

$\le 1 + 2\beta^{-1} + e\beta^{-1} \sum_{t=1}^{\lfloor (n-k+1)/2-1 \rfloor} \sqrt{t} (t+1)(t+2) \left(\frac{n-k+1}{2e}\right)^{t} \left(\frac{n-k+1}{e}\right)^{-t}$

$\le 1 + 2\beta^{-1} + e\beta^{-1} \sum_{t=1}^{\infty} \sqrt{t} (t+1)(t+2) 2^{-t}$

$\le 1 + c(n-k+1)^{-1}$.

Here we have used that $(n-k+1)/e \le \beta \le (n-k+1)$ and that $n! \le e n^{n+1/2} e^{-n}$, which follows
from standard estimates of factorials.

The Proposition now follows from this result together with (2.11). ■

Proposition 2.11. For any fixed $l$ there is a constant $C_l > 0$ such that

(2.12)    $T(n, n-l) \le C_l n^l$

for all $n > l$.

Proof. We may, without loss of generality, assume that $n \ge 2l$.

Recall the formula for $T(n, n-l)$ in Proposition 2.7. As $s_1, \dots, s_{n-l} \ge 1$ and $s_1 + \cdots + s_{n-l} = n$
it is easy to see that all but at most $l$ variables are 1. This implies that $T(n, n-l)$ is at most
$\binom{n-l}{l}$ times the contribution from all terms where $s_{l+1} = \cdots = s_{n-l} = 1$. Using $T(1, 1) = 1$, we
get

(2.13)    $T(n, n-l) \le \binom{n-l}{l} \sum_{\substack{s_1, \dots, s_l \ge 1 \\ s_1 + \cdots + s_l = 2l}} T(s_1, 1) \cdots T(s_l, 1) \le C_l n^l$.

Proposition 2.12. For sufficiently large $c$

(2.14)    $T(n, n-l) \le c (l+1) \left(\frac{n+2l}{5}\right)^{l}$.

Proof. Let

(2.15)    $S(n, n-l) = (l+1) \left(\frac{n+2l}{5}\right)^{l}$,

i.e.

(2.16)    $S(n, k) = (n-k+1) \left(\frac{3n-2k}{5}\right)^{n-k}$.

We will begin by showing that $S(n, k)$ satisfies

(2.17)    $S(n, k) \ge \sum_{i=1}^{n-k+1} i! \, S(n-i, k-1)$

for $k > 1$ and sufficiently large $n - k$. Here we have

$\sum_{i=1}^{n-k+1} i! \, S(n-i, k-1) = \sum_{i=1}^{n-k+1} i! \, (n-k+2-i) \left(\frac{3n-2k-3i+2}{5}\right)^{n-k-i+1}$

$\le (n-k+1) \left(\frac{3n-2k-1}{5}\right)^{n-k} + \sum_{i=2}^{n-k+1} i! \, (n-k+1) \left(\frac{3n-2k}{5}\right)^{n-k-i+1}$

$= S(n, k) \left( \left(1 - \frac{1}{3n-2k}\right)^{n-k} + \sum_{i=2}^{n-k+1} i! \left(\frac{3n-2k}{5}\right)^{1-i} \right)$,

where

$\left(1 - \frac{1}{3n-2k}\right)^{n-k} \le \exp\left(-\frac{n-k}{3n-2k}\right) \le \exp\left(-\frac{n-k}{3n}\right) \le \max\left(\frac{1}{2}, 1 - \frac{n-k}{6n}\right)$

and

$\sum_{i=2}^{n-k+1} i! \left(\frac{3n-2k}{5}\right)^{1-i} \le \frac{10}{3n-2k} + \frac{5e}{3n-2k} \sum_{j=1}^{n-k-1} \sqrt{j} (j+1)(j+2) \left(\frac{n-k}{e}\right)^{j} \left(\frac{3n-2k}{5}\right)^{-j}$

$\le \frac{1}{n} \left( 10 + 5e \sum_{j=1}^{\infty} \sqrt{j} (j+1)(j+2) \left(\frac{5}{3e}\right)^{j} \right) =: \frac{C}{n}$.

It follows directly that (2.17) holds for $k > 1$ and $n - k \ge 6C$.

Now, if we can choose $c$ so that $T(n, k) \le c S(n, k)$ for $k = 1$ and for $n - k < 6C$ the Proposition
follows from Proposition 2.8 by induction on $k$. Hence it suffices to show the Proposition for
these two cases.

For $k = 1$, the inequality holds for sufficiently large $c$ by the fact that

$\frac{T(n, 1)}{S(n, 1)} \le \frac{n!}{n} \left(\frac{3n-2}{5}\right)^{-n+1} \to 0$ as $n \to \infty$.

For $n - k < 6C$, this follows from Proposition 2.11. ■

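2.3. Computing $\mathbb{E}[X^2]$. Pick $\delta > 0$ sufficiently small. We divide the sum in (2.4) into the
contribution from $k \le (1-\delta)n$ and the contribution from $k > (1-\delta)n$:

(2.18)    $\sum_{k=1}^{n} n! \, T(n,k) \binom{2n-2k}{n-k} \frac{(1-\alpha)^{2n-k-1}}{(2n-k-1)!} = \sum_{k \le (1-\delta)n} n! \, T(n,k) \binom{2n-2k}{n-k} \frac{(1-\alpha)^{2n-k-1}}{(2n-k-1)!} + \sum_{0 \le l < \delta n} n! \, T(n, n-l) \binom{2l}{l} \frac{(1-\alpha)^{n+l-1}}{(n+l-1)!}$.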

Proposition 2.13. For $k$ constant and $\alpha = o(1)$

(2.19)    $n! \, T(n,k) \binom{2n-2k}{n-k} \frac{(1-\alpha)^{2n-k-1}}{(2n-k-1)!} \sim k \, 2^{1-k} n^2 (1-\alpha)^{2n}$.

Proof. A simple lower bound on $T(n, k)$ is the number of permutations with $k$ components
where all but one component contains exactly 1 element. For sufficiently large $n$ this is given by
$k T(n-k+1, 1)$, which by Proposition 2.6 is $\sim k (n-k+1)!$. Furthermore, from Proposition 2.10
we know that $T(n, k)$ is at most $(1 + o(1)) k (n-k+1)!$. Hence for constant $k$, $T(n, k) \sim k (n-k+1)!$.
The Proposition now follows from standard estimates of factorials. ■

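Proposition 2.14. Let $\alpha = o(1)$. For any $0 < \delta < 1$, the contribution to (2.18) from
$k \le (1-\delta)n$ is $\sim 4n^2 (1-\alpha)^{2n}$.

Proof. From Proposition 2.10 it follows that there is a constant $C_\delta$ such that
$T(n, k) \le C_\delta k (n-k+1)!$ whenever $k \le (1-\delta)n$. Using this we have

(2.20)    $n! \, T(n,k) \frac{\binom{2n-2k}{n-k}}{(2n-k-1)!} \le C_\delta \, n! \, k (n-k+1)! \frac{\binom{2n-2k}{n-k}}{(2n-k-1)!}$

for all $k \le (1-\delta)n$. Now by extensive use of Stirling's formula there is a constant $C > 0$ such
that:

$C_\delta \, n! \, k (n-k+1)! \frac{\binom{2n-2k}{n-k}}{(2n-k-1)!} \le C_\delta C k (n-k+1) \sqrt{n(2n-k)} \; 2^{-k} \left(1 - \frac{k}{n}\right)^{n-k} \left(1 - \frac{k}{2n}\right)^{-(2n-k)}$,

where

$\left(1 - \frac{k}{n}\right)^{n-k} \left(1 - \frac{k}{2n}\right)^{-(2n-k)} \le \left(1 - \frac{k}{2n}\right)^{2(n-k)} \left(1 - \frac{k}{2n}\right)^{-(2n-k)} = \left(1 - \frac{k}{2n}\right)^{-k} \le \left(1 - \frac{1-\delta}{2}\right)^{-k} = \left(\frac{2}{1+\delta}\right)^{k}$.

This means that for all $\delta > 0$ there exists a constant $C'_\delta$ such that for $k \le (1-\delta)n$ and sufficiently
large $n$ we have

(2.21)    $n! \, T(n,k) \binom{2n-2k}{n-k} \frac{(1-\alpha)^{2n-k-1}}{(2n-k-1)!} \le C'_\delta \, n^2 k (1+\delta)^{-k} (1-\alpha)^{2n-k}$.

Since $\sum_{k=1}^{\infty} k (1+\delta)^{-k} (1-\alpha)^{-k}$ converges for sufficiently small $\alpha$ we have shown that the
contribution from $k \le (1-\delta)n$ is $O(n^2 (1-\alpha)^{2n})$. Furthermore, if we assume that $n$ is sufficiently
large so that $(1+\delta)(1-\alpha) \ge (1 + \frac{\delta}{2})$, then as the terms in the sum

(2.22)    $\sum_{k \le (1-\delta)n} \frac{1}{n^2 (1-\alpha)^{2n}} \, n! \, T(n,k) \binom{2n-2k}{n-k} \frac{(1-\alpha)^{2n-k-1}}{(2n-k-1)!}$

are dominated by the terms in

(2.23)    $\sum_{k=1}^{\infty} C'_\delta \, k \left(1 + \frac{\delta}{2}\right)^{-k}$,

which converges, it follows by dominated convergence together with Proposition 2.13 that

$\sum_{k \le (1-\delta)n} \frac{1}{n^2 (1-\alpha)^{2n}} \, n! \, T(n,k) \binom{2n-2k}{n-k} \frac{(1-\alpha)^{2n-k-1}}{(2n-k-1)!} \to \sum_{k=1}^{\infty} k \, 2^{1-k} = 4$. ■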


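Proposition 2.15. For sufficiently small $\delta > 0$ and $\alpha = o(1)$, the contribution to (2.18) from
$k > (1-\delta)n$ is $O(n(1-\alpha)^n)$.

Proof. Using Proposition 2.12 there is a constant $C$ such that this sum is bounded by

$\sum_{0 \le l < \delta n} n! \, C (l+1) \left(\frac{n+2l}{5}\right)^{l} \binom{2l}{l} \frac{(1-\alpha)^{n+l-1}}{(n+l-1)!}$

$\le C (1-\alpha)^{n-1} \sum_{0 \le l < \delta n} n^{1-l} (l+1) \left(\frac{n+2l}{5}\right)^{l} 4^{l}$

$\le C n (1-\alpha)^{n-1} \sum_{l=0}^{\infty} (l+1) \left(\frac{4(1+2\delta)}{5}\right)^{l}$,

where the last sum clearly converges for sufficiently small $\delta$. ■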

Proposition 2.16. Let $X$ be the number of accessible paths in the $\alpha$-HoC model where
$\alpha = \frac{\ln n}{n} - \varepsilon_n$ where $n\varepsilon_n \to \infty$. Then

(2.24)    $\mathbb{E}[X^2] \sim 4n^2 (1-\alpha)^{2n}$.

Proof. From Proposition 2.3 together with Propositions 2.14 and 2.15 we know that

(2.25)    $\mathbb{E}[X^2] \le (4 + o(1)) \, n^2 (1-\alpha)^{2n} + O(n(1-\alpha)^n)$,

where one can show that $n(1-\alpha)^n = o(n^2 (1-\alpha)^{2n})$, provided $n\varepsilon_n \to \infty$.

To derive a tight lower bound for $\mathbb{E}[X^2]$, consider the sum of $\mathbb{E}[X_i X_j]$ over all pairs of paths
whose number of common interior nodes, $k - 1$, is at most $\frac{n}{2} - 1$ and that diverge at most once.
Expressed in terms of components of permutations, for a fixed $i$ and $k$, the number of paths
$j$ that satisfy this equals the number of permutations with $k$ components, where all but one
component contains exactly 1 element. This can clearly be done in $k T(n-k+1, 1) \sim k (n-k+1)!$
ways.

By Proposition 2.3 this yields

(2.26)    $\mathbb{E}[X^2] \ge \sum_{k=1}^{\lfloor n/2 \rfloor} n! \, k \, T(n-k+1, 1) \binom{2n-2k}{n-k} \frac{(1-\alpha)^{2n-k-1}}{(2n-k-1)!}$.

Proceeding in a manner similar to the proof of Proposition 2.14 we get that

(2.27)    $\sum_{k=1}^{\lfloor n/2 \rfloor} n! \, k \, T(n-k+1, 1) \binom{2n-2k}{n-k} \frac{(1-\alpha)^{2n-k-1}}{(2n-k-1)!} \sim 4n^2 (1-\alpha)^{2n}$,

which concludes the proof. ■



From this proof we can observe that almost all of the contribution to $\mathbb{E}[X^2]$ comes from pairs
of paths we considered in the lower bound. This implies the following:

Corollary 2.17. A 

The contribution to $\mathbb{E}[X^2]$ from all pairs of paths that either share more than $(1-\delta)n$ common nodes or that
diverge more than once is $o(n^2 (1-\alpha)^{2n})$.

2.4. Proof of Theorem 1.2. From Propositions 2.1 and 2.16 together with Lemma 2.2 it
follows that, for any sequence $\varepsilon_n$ satisfying $n\varepsilon_n \to \infty$,

(2.28)    $\liminf_{n\to\infty} P\left(n, \frac{\ln n}{n} - \varepsilon_n\right) \ge \frac{1}{4}$.

In the following Proposition, we will show that this result can be improved to any value less
than one, implying that $P(n, \frac{\ln n}{n} - \varepsilon_n)$ must tend to 1.

Proposition 2.18. Define the sequence $C_k$ by $C_1 = \frac{1}{4}$ and $C_{k+1} = 1 - (1 - C_k)\left(1 - \frac{1}{8}\right)$
for $k \ge 1$. If $\alpha = \frac{\ln n}{n} - \varepsilon_n$ for some sequence $\varepsilon_n$ satisfying $n\varepsilon_n \to \infty$, then for any $k$,
$P(n, \alpha) \ge C_k - o(1)$.

Proof. As noted above, the case $k = 1$ follows directly from Propositions 2.1 and 2.16 together
with Lemma 2.2.

Assume the Proposition is true for $k$. By Chebyshev's inequality, there are, with probability
$1 - o(1)$, two nodes, $a_1, a_2$, satisfying $d(a_1, v^0) = d(a_2, v^0) = 1$ that have fitness at most $\varepsilon_n/3$,
and two nodes, $b_1, b_2$, satisfying $d(b_1, v^1) = d(b_2, v^1) = 1$ that have fitness at least $1 - \varepsilon_n/3$. We
may without loss of generality assume that neither $a_1$ and $b_1$ nor $a_2$ and $b_2$ are antipodes. By
assumption, the probability of accessible paths in the induced subgraph between $a_1$ and $b_1$ is at
least $C_k - o(1)$. This means that the Proposition holds for $k + 1$ if the probability of accessible
paths passing through both $a_2$ and $b_2$ but no node on the induced subgraph between $a_1$ and $b_1$
is at least $\frac{1}{8} - o(1)$.

The criterion for a path to pass through the induced subgraph between $a_1$ and $b_1$ is that
it must flip the bit that is 1 in $a_1$ before the bit that is 0 in $b_1$. This means that at least half of the
paths passing through $a_2$ and $b_2$ do not pass through a node on the induced subgraph between
$a_1$ and $b_1$. As the distribution of accessible paths is invariant under permutations of bits, and the
probability of some accessible path through $a_2$ and $b_2$ is at least $\frac{1}{4} - o(1)$ by the case $k = 1$, the
probability of accessible paths of this type is at least $\frac{1}{8} - o(1)$. The Proposition follows by
induction. ■

2.5. Proof of Corollary 1.4. A key observation is that an alternative formulation of the
$\alpha$-HoC model is to assign fitnesses as in the CHoC model and then remove each node except $v^0$
and $v^1$ independently with probability $\alpha$ (a similar idea is used in Section 3). More precisely, if
we consider the nodes in the $\alpha$-HoC model removed if they have fitness less than $\alpha$, then these
two models yield the same distribution of fitnesses (up to an affine transformation). Similarly,
assigning fitnesses as in the $\alpha$-HoC model and then removing each node except $v^0$ and $v^1$
independently with probability $\delta$ is an equivalent formulation of the $(1 - (1-\alpha)(1-\delta))$-HoC
model.

The upper bound on $X$ is simply Markov's inequality. We now turn to the lower bound.
To simplify calculations we may, without loss of generality, assume that $w_n = o(n\varepsilon_n)$ and that
$1 \le w_n \le e^{n\varepsilon_n}$ for all $n$. Let $\delta_n = \varepsilon_n - \frac{\ln w_n}{n}$ and let $Y$ denote the number of intact accessible
paths using the same assignment of fitnesses as for $X$ but after removing each node except $v^0$
and $v^1$ independently with probability $\delta_n$. By assumption, we know that $0 \le \delta_n \le \varepsilon_n$ so $\delta_n$ is
always a valid probability.

By the reasoning above, $Y$ has the same distribution as the number of accessible paths in
the $\alpha'$-HoC model where $\alpha' = 1 - (1-\alpha)(1-\delta_n) = \frac{\ln n}{n} - \frac{\ln w_n + o(1)}{n}$. As $\ln w_n + o(1) \to \infty$
as $n \to \infty$ it follows from Theorem 1.2 that $\lim_{n\to\infty} \mathbb{P}(Y = 0) = 0$.

As an accessible path has $n - 1$ interior nodes, the probability of an accessible path remaining
intact after removing each interior node independently with probability $\delta_n$ is $(1-\delta_n)^{n-1}$. Now,
if all accessible paths before we remove any nodes have distinct interior nodes, we know that
the probability of no accessible paths remaining is $(1 - (1-\delta_n)^{n-1})^X$. If there are paths that
share interior nodes, it is intuitively clear that the probability of no accessible paths remaining
must be even higher. For a complete proof of this, see for instance Theorem 8.1.1 in [AS]. This
implies that

(2.29)    $\mathbb{P}(Y = 0 \mid X) \ge (1 - (1-\delta_n)^{n-1})^X$.

But since $\lim_{n\to\infty} \mathbb{P}(Y = 0) = 0$ and $(1 - (1-\delta_n)^{n-1})^X = e^{-(1+o(1)) e^{-n\delta_n} X}$ it follows that
$e^{-n\delta_n} X$ must tend to infinity in probability. To conclude the proof we note that
$e^{-n\delta_n} X = \frac{X}{e^{n\varepsilon_n}/w_n} \sim \frac{X}{\mathbb{E}[X]/w_n}$. ■

Remark 2.19. Note that Proposition 2.16 implies that $\mathrm{Var}(X) \sim 3\mathbb{E}[X]^2$ for $\alpha$ in this regime, so
no significant improvement on Corollary 1.4 can be made by a naive application of Chebyshev's
inequality.

3. Results for the RMF model 

Let $n \in \mathbb{N}$ and let $\varepsilon = \varepsilon_n$ be some strictly positive function. Consider the $n$-dimensional
hypercube in which $v^0$ and $v^1$ are present, and where every other vertex is present with
probability $\varepsilon_n$, independently of all other vertices. Let $Y = Y_{n,\varepsilon_n}$ denote the number of accessible
paths from $v^0$ to $v^1$, where in this model a path is accessible if Hamming distance from $v^0$ is
strictly increasing and all vertices along the path are present. The following proposition may be
well-known, as it can be interpreted in the context of site percolation on the oriented hypercube.
However, we were not able to locate a suitable reference.



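Proposition 3.1. (i) $\mathbb{E}[Y] = n! \cdot \varepsilon_n^{n-1}$.

(ii) Let $n \to \infty$ and suppose that $n\varepsilon_n \to \infty$. Then $\mathrm{Var}(Y) = o(\mathbb{E}[Y]^2)$, and hence

(3.1)    $Y \sim \mathbb{E}[Y] \sim \frac{\sqrt{2\pi n}}{\varepsilon_n} \left(\frac{n\varepsilon_n}{e}\right)^{n}$.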

Proof. There are $n!$ possible paths in the $n$-hypercube. Each path contains $n - 1$ interior
vertices, each of which is present with probability $\varepsilon_n$. This proves (i). Set $\mu = \mu_n := n! \, \varepsilon_n^{n-1}$.
Now suppose $n\varepsilon_n \to \infty$. Let $Y_i$ be the indicator of the event that the $i$:th increasing path is
accessible, where the paths have been ordered in any way. Fix any path $i_0$. Then, by a standard
second moment estimate (see Section 2),

(3.2)    $\mathrm{Var}(Y) \le \mu + n! \cdot \sum_j \mathbb{P}(Y_{i_0} \wedge Y_j)$,

where the sum is taken over all paths $j$ which intersect the path $i_0$ in at least one interior vertex.
Let $k$ be the number of intersection points. This leaves $T(n, k+1)$ possibilities for the path $j$.
The paths $i_0$ and $j$ contain a total of $2n - 2 - k$ different interior vertices, hence the probability
of both being present is $\varepsilon_n^{2n-2-k}$. Hence, after shifting the index $k$ by one,

(3.3)    $\mathrm{Var}(Y) \le \mu + n! \cdot \sum_{k=2}^{n} T(n, k) \, \varepsilon_n^{2n-k-1} \le \mu + \mu^2 \cdot \sum_{k=2}^{n} \frac{T(n, k)}{n! \, \varepsilon_n^{k-1}}$.

Hence, since $\mu \to \infty$ when $n\varepsilon_n \to \infty$, it suffices to show that

(3.4)    $\sum_{k=2}^{n} \frac{T(n, k)}{n! \, \varepsilon_n^{k-1}} = o(1)$.

We now follow the same strategy as in Section 2, but the analysis here is much simpler. Let
$\delta \in (0,1)$. We divide the sum in (3.4) into two parts, one for $k \le (1-\delta)n$ and the other
for $k > (1-\delta)n$. From Proposition 2.10 and Lebesgue's dominated convergence theorem,
it follows easily that, for any $\delta > 0$, the sum over terms $k \le (1-\delta)n$ is bounded by
$(1 + o_n(1)) \sum_{k=2}^{\infty} \frac{k}{(n\varepsilon_n)^{k-1}} = \Theta\left(\frac{1}{n\varepsilon_n}\right) = o(1)$, provided $n\varepsilon_n \to \infty$. Similarly, from Proposition 2.12 it
follows that the sum over terms $k > (1-\delta)n$ is bounded by

(3.5)    $\frac{c}{\mu} \sum_{l=0}^{\delta n} (l+1) \left(\frac{1+2\delta}{5} \cdot n\varepsilon_n\right)^{l}$,

where $c$ is an absolute constant. Since $n\varepsilon_n \to \infty$, the sum in (3.5) is bounded by $1 + o(1)$ times
the last term, and hence is $O((n\varepsilon_n)^{\delta n})$, which is in turn $o(\mu)$. This proves (3.4) and completes
the proof of the proposition. ■
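Part (i) is linearity of expectation and can be confirmed exactly for very small $n$ by summing over all presence configurations of the interior vertices. The sketch below is our own illustration (the function names are hypothetical): vertices are bitmasks, open directed paths are counted by dynamic programming, and each configuration is weighted with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def count_open_paths(n, present):
    """Directed paths from 00...0 to 11...1 whose interior vertices all lie in `present`."""
    full = (1 << n) - 1
    paths = [0] * (1 << n)
    paths[0] = 1
    for v in range(1, full + 1):
        if v != full and v not in present:
            continue  # absent vertex: no path may pass through it
        paths[v] = sum(paths[v ^ (1 << b)] for b in range(n) if (v >> b) & 1)
    return paths[full]


def exact_expected_paths(n, eps):
    """E[Y] by exhaustive enumeration of interior-vertex configurations (tiny n only)."""
    full = (1 << n) - 1
    interior = list(range(1, full))
    total = Fraction(0)
    for keep in product((False, True), repeat=len(interior)):
        present = {v for v, kept in zip(interior, keep) if kept}
        absent = len(interior) - len(present)
        weight = eps ** len(present) * (1 - eps) ** absent
        total += weight * count_open_paths(n, present)
    return total


if __name__ == "__main__":
    eps = Fraction(1, 3)
    for n in (2, 3):
        print(n, exact_expected_paths(n, eps), factorial(n) * eps ** (n - 1))
```

The enumeration has $2^{2^n - 2}$ configurations, so it is only a consistency check for $n \le 4$ or so; it says nothing about the variance estimate in part (ii).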



We now turn to the Rough Mount Fuji (RMF) model and prove Theorem 1.5.

We shall abuse notation and also use $\eta$ to denote the p.d.f. of the probability distribution under
consideration. So suppose $\eta$ has connected support and is continuous there. Let $\delta > 0$ be given.
Then there exists a bounded, closed interval $I = I_\delta \subseteq \mathrm{Supp}(\eta)$ such that $\int_{I_\delta} \eta(x) \, dx > 1 - \delta$.
The quantity $c_{\eta,\delta} = \min_{x \in I_\delta} \eta(x)$ exists, is non-zero and, obviously, depends only on $\eta$ and $\delta$.
Now let $n \in \mathbb{N}$ and $\theta = \theta_n > 0$ be given. Without loss of generality, we may assume that the
interval $I_\delta$ has length $l(I_\delta) > \theta_n/2$. By definition of $I_\delta$, with probability at least $(1-\delta)^2$ each
of $\eta(v^0)$ and $\eta(v^1)$ lie in $I_\delta$. Let $X_{\delta,n,\theta_n}$ be the number of accessible paths in the $n$-hypercube,
where fitnesses are assigned as in (1.2), and conditioning on the fact that both $\eta(v^0)$ and $\eta(v^1)$
lie in $I_\delta$. We claim that, if $n$ is sufficiently large, then $X_{\delta,n,\theta_n}$ stochastically dominates the
random variable $Y_{n,\varepsilon_n}$ in Proposition 3.1, where $\varepsilon_n = c_{\eta,\delta} \cdot \frac{\theta_n}{2}$.

To see this, first note that, as long as $l(I_\delta) > \theta_n/2$ then, for any point $x \in I_\delta$, there will be an
interval $I_x$ of length at least $\theta_n/2$, which contains $x$ and lies entirely within $I_\delta$. By assumption,
any such interval captures at least $c_{\eta,\delta} \cdot \frac{\theta_n}{2}$ of the distribution $\eta$. For any adjacent pair $(v, v')$
of vertices in the hypercube such that $d(v', v^0) = d(v, v^0) + 1$, if $\eta(v') > \eta(v) - \theta_n$, then $v'$ is
accessible from $v$. Assuming $\eta(v^0) \in I_\delta$, it follows that we can choose, for each layer $i$ in the
hypercube, an interval $I_i \subseteq I_\delta$ of length $\theta_n/2$ such that any path

(3.6)    $v^0 \to v_1 \to v_2 \to \cdots \to v_{n-1}$

for which $\eta(v_i) \in I_i$ for all $i = 1, \dots, n-1$, is accessible. If $n$ is sufficiently large, we can also
ensure that the interval $I_{n-1}$ contains $\eta(v^1)$, so that any viable path (3.6) can definitely be
continued to $v^1$. The stochastic domination of $Y_{n,\varepsilon_n}$ by $X_{\delta,n,\theta_n}$ now follows. Then one just
needs to apply Proposition 3.1 and Theorem 1.5 follows immediately.

Remark 3.2. Suppose $\mathrm{Supp}(\eta)$ is also bounded, and that $\theta$ is a constant, independent of $n$.
Let

(3.7)    $C_{\eta,\theta} := \min_{l(I) = \theta/2, \; I \subseteq \mathrm{Supp}(\eta)} \int_I \eta(x) \, dx$,

where $I$ denotes a closed interval. Then this minimum exists and is non-zero. It follows from
Proposition 3.1 and the argument above that the number $X = X(n)$ of accessible paths in this
case satisfies

(3.8)    $X \gtrsim n! \cdot C_{\eta,\theta}^{\,n-1}$.

The point is that $C_{\eta,\theta} \in (0,1]$ is a constant depending only on $\eta$ and $\theta$.